LeeWave: level-wise distribution of wavelet coefficients for processing kNN queries over distributed streams

Authors

  • Mi-Yen Yeh
  • Kun-Lung Wu
  • Philip S. Yu
  • Ming-Syan Chen
Abstract

We present LEEWAVE, a bandwidth-efficient approach to searching for range-specified k-nearest neighbors among distributed streams by LEvEl-wise distribution of WAVElet coefficients. To find the k streams most similar to a range-specified reference stream, the relevant wavelet coefficients of the reference stream can be sent to the peer sites to compute the similarities. However, bandwidth can be wasted unnecessarily if all of the relevant coefficients are sent at once. Instead, we present a level-wise approach that leverages the multi-resolution property of the wavelet coefficients. Starting from the top level and moving down one level at a time, the query initiator sends only the coefficients of a single level to a progressively shrinking set of candidates. However, LEEWAVE faces one key challenge: how can the query initiator prune the candidates without knowing all the relevant coefficients? To overcome this challenge, we derive and maintain a similarity range for each candidate and gradually tighten the bounds of this range as we move from one level to the next. The increasingly tight similarity ranges enable the query initiator to prune candidates effectively without causing any false dismissal. Extensive experiments with real and synthetic data show that, compared with prior approaches, LEEWAVE uses significantly less bandwidth under a wide range of conditions.
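The level-wise idea above can be illustrated with a small Python simulation. This is a minimal sketch, not the LEEWAVE algorithm from the paper: it assumes an orthonormal Haar transform, a reference window that has already been cut to the specified range with a power-of-two length, and a simple energy-based similarity range in place of the paper's own bounds. Because the Haar basis is orthonormal, the squared Euclidean distance equals the sum of squared coefficient differences; after some levels have been sent, that part is known exactly, and the unknown remainder lies between (‖p_rest‖ − ‖q_rest‖)² and (‖p_rest‖ + ‖q_rest‖)² by the triangle inequality. The names `haar_levels`, `energy` and `levelwise_knn` are hypothetical, and "bandwidth" is approximated as the number of reference coefficients shipped to peers (return messages are ignored).

```python
import math
from typing import Dict, List, Tuple


def haar_levels(x: List[float]) -> List[List[float]]:
    """Orthonormal Haar transform, returned level by level, coarsest first.

    Output: [[overall approximation], [coarsest details], ..., [finest details]].
    The input length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = [float(v) for v in x], []
    r = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / r for a, b in pairs])  # finest level first
        approx = [(a + b) / r for a, b in pairs]
    return [approx] + details[::-1]  # reorder so the coarsest level comes first


def energy(coeffs: List[float]) -> float:
    return sum(c * c for c in coeffs)


def levelwise_knn(query: List[float], peers: Dict[str, List[float]],
                  k: int) -> Tuple[List[str], int]:
    """Simulated level-wise kNN; returns (ids of k nearest peers, coefficients sent).

    Distances are squared Euclidean (hypothetical setup, not the paper's protocol).
    """
    q_levels = haar_levels(query)
    p_levels = {pid: haar_levels(s) for pid, s in peers.items()}
    partial = {pid: 0.0 for pid in peers}  # exact distance over the levels sent so far
    candidates = set(peers)
    sent = 0
    for lvl, q_coeffs in enumerate(q_levels):
        # Norm of the reference levels not yet sent (one scalar per round).
        q_rest = math.sqrt(sum(energy(c) for c in q_levels[lvl + 1:]))
        bounds = {}
        for pid in candidates:
            sent += len(q_coeffs)  # this level goes to every surviving candidate
            p_coeffs = p_levels[pid][lvl]
            partial[pid] += sum((a - b) ** 2 for a, b in zip(q_coeffs, p_coeffs))
            # Each peer knows the norm of its own unsent levels.
            p_rest = math.sqrt(sum(energy(c) for c in p_levels[pid][lvl + 1:]))
            # Triangle inequality on the residual vectors bounds the unknown part.
            lo = partial[pid] + (p_rest - q_rest) ** 2
            hi = partial[pid] + (p_rest + q_rest) ** 2
            bounds[pid] = (lo, hi)
        uppers = sorted(hi for _, hi in bounds.values())
        kth_upper = uppers[k - 1] if len(uppers) >= k else math.inf
        # A peer whose lower bound exceeds the k-th smallest upper bound cannot
        # be among the k nearest, so dropping it causes no false dismissal.
        candidates = {pid for pid, (lo, _) in bounds.items() if lo <= kth_upper}
    # After the last level both residual norms are zero, so bounds are exact.
    ranked = sorted(candidates, key=lambda pid: (partial[pid], pid))
    return ranked[:k], sent


if __name__ == "__main__":
    q = [1, 2, 3, 4, 5, 6, 7, 8]
    peers = {
        "a": list(q),               # identical to the reference window
        "b": [v + 0.1 for v in q],  # small constant shift
        "c": q[::-1],               # same values, reversed
        "d": [100.0] * 8,           # far from the reference
    }
    ids, sent = levelwise_knn(q, peers, k=1)
    print(ids, sent, "coefficients sent vs", len(q) * len(peers), "all at once")
```

In this toy run the distant peer fails the bound check after the first level and stops receiving coefficients, so `sent` ends up below the 32 coefficients that shipping every level to all four peers would cost. The sketch trades tightness for simplicity: the paper derives its own similarity ranges, which may prune more aggressively than this energy bound.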

Similar resources

A Dynamic Method for Answering Ad Hoc Aggregate Continuous Queries

Data Streams are infinite, fast, time-stamp data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

Sketching Streams Through the Net: Distributed Approximate Query Tracking

Emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically-distributed streams. Effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In...

Statistical Wavelet-based Image Denoising using Scale Mixture of Normal Distributions with Adaptive Parameter Estimation

Removing noise from images is a challenging problem in digital image processing. This paper presents an image denoising method based on a maximum a posteriori (MAP) density function estimator, which is implemented in the wavelet domain because of its energy compaction property. The performance of the MAP estimator depends on the proposed model for noise-free wavelet coefficients. Thus in the wa...

Continuous Distributed Stream Querying using Sketches

While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying...

Approximate NN queries on Streams with Guaranteed Error/performance Bounds

In data stream applications, data arrive continuously and can only be scanned once as the query processor has very limited memory (relative to the size of the stream) to work with. Hence, queries on data streams do not have access to the entire data set and query answers are typically approximate. While there have been many studies on the k Nearest Neighbors (kNN) problem in conventional multid...

Journal:
  • PVLDB

Volume 1

Publication date: 2008